Abstract



In a recent work we introduced a novel approach to study the effect of weak non-linearities in the transfer function on the information transmitted by an analogue channel, by means of a perturbative diagrammatic expansion. Here we extend the analysis to all orders in perturbation theory, which allows us to relax any constraint on the magnitude of the expansion parameter and to establish rules for easily calculating the contribution at any order. As an example, we explicitly compute the information up to second order in the non-linearity, in the presence of random Gaussian connectivities and in the limit where the output noise is not small. We analyze the first- and second-order contributions to the mutual information as a function of the non-linearity and of the number of output units. We believe that an extensive application of our method, via the analysis of the different contributions at distinct orders, may help fill the gap between the well-known analytical results obtained for linear channels and the non-trivial treatments required to study highly non-linear channels.
1 Introduction



Several recent investigations have explored the efficiency of two-layer networks in transmitting information, given that the distribution of the input layer and the input-output transformation are known [1], [?], [5], [?], [7], [?]. Some of these works have been inspired by processes involving populations of real neurons in the brain [?]. There is experimental evidence that real neurons perform a highly non-linear transformation of their inputs, whose main features are well captured by a threshold-linear function. Moreover, this type of transformation allows an easier analytical treatment, at least under the assumption of replica symmetry [?]. A sigmoidal transfer function has been proposed in other works and in the more general context of neural networks [10], [11]. Yet this choice makes the analytical treatment much more difficult.

In a more theoretical framework, recent studies have explored the communication properties of linear and binary channels, where the transfer function is either purely linear [4], [5] or highly non-linear [?], [7], [?]. The analytical solution in the purely linear case is straightforward, while in the presence of a step transfer function one must resort to approximations such as replica symmetry, or restrict oneself to particular regions of the parameter space.

No extensive study has yet attempted to bridge these two limiting cases, pure linearity and strong non-linearity of the transfer function, with respect to their impact on the information content, i.e. on the channel efficiency.

In a very recent study [9], the contribution of small non-linearities to the mutual information was evaluated for a Gaussian noisy channel, introducing a novel approach based on a perturbative expansion and providing an elegant interpretation in terms of Feynman diagrams [12]. An analytical expression for the mutual information was obtained at first order in the non-linearity.

Here we extend the previous analysis to all orders in perturbation theory, deriving both the analytical expansion and the diagrammatic formalism needed to evaluate the contribution to the mutual information at each order. We then apply our method and quantify the first- and second-order contributions to the mutual information in the case of random Gaussian connectivities and in the limit of large output noise.

Even though not motivated by any particular hardware or biological application, our 
study is an attempt to fill a theoretical gap between purely linear and strongly non-linear 
channels, at least in the case of gaussian units. 

We believe that an extensive application of our expansion will make it possible to examine the impact of non-linearities on information transmission and how this impact is modulated by the noise and by the other parameters of the model. This will be the object of future investigations.



2 The model 

The network model is analogous to the one used in [?].

The distribution of the N continuous inputs $x = \{x_1, \dots, x_N\}$ is Gaussian with correlation matrix C, and each input signal is corrupted by uncorrelated Gaussian noise $\nu = \{\nu_1, \dots, \nu_N\}$ of variance $b_0$:

$$\langle x_i\rangle = 0, \qquad (1)$$
$$\langle x_i x_j\rangle = C_{ij}, \qquad \forall\, i,j \in 1, 2, \dots, N, \qquad (2)$$
$$\langle \nu_i\rangle = 0, \qquad (3)$$
$$\langle \nu_i \nu_j\rangle = b_0\,\delta_{ij}, \qquad \forall\, i,j \in 1, 2, \dots, N. \qquad (4)$$

Each output unit performs a linear summation of the noisy inputs $x + \nu$ via the connectivity matrix $J = \{J_{ij}\}$; the result is transformed via a non-linear transfer function G:

$$V = G\big(J(x+\nu)\big) + z, \qquad (5)$$

where $z = \{z_1, \dots, z_N\}$ is uncorrelated Gaussian noise affecting the output, distributed according to:

$$\langle z_i\rangle = 0, \qquad (6)$$
$$\langle z_i z_j\rangle = b\,\delta_{ij}, \qquad \forall\, i,j \in 1, 2, \dots, N. \qquad (7)$$

In analogy with the case examined in [9], we consider a sigmoidal transfer function and assume that the non-linearity is small, so that a Taylor expansion can be performed:

$$G(x) = \tanh(x) = x - \frac{x^3}{3} + \frac{2x^5}{15} - \dots \qquad (8)$$

For small values of x one can stop at the first non-linear term, the cubic one. To illustrate the method we first focus on a cubic non-linearity for simplicity. Nonetheless, as will become clear in the following, our method is equally applicable to any term in the expansion of the hyperbolic tangent, which in the end allows the whole series to be reconstructed.
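As a quick numerical illustration of why the expansion can be truncated at the cubic term for small arguments, the following sketch (our own illustration, not part of the original analysis) compares tanh with its odd Taylor polynomials of order 1, 3 and 5. The helper name `tanh_taylor` is hypothetical; the three coefficients are the standard ones of the tanh series.

```python
import numpy as np

# Standard Taylor coefficients of tanh(x) = x - x^3/3 + 2x^5/15 - ...
TANH_COEFFS = {1: 1.0, 3: -1.0 / 3.0, 5: 2.0 / 15.0}

def tanh_taylor(x, order):
    """Odd Taylor polynomial of tanh truncated at the given order (1, 3 or 5)."""
    return sum(c * x**k for k, c in TANH_COEFFS.items() if k <= order)

x = np.linspace(-0.3, 0.3, 61)
max_err = {order: np.max(np.abs(np.tanh(x) - tanh_taylor(x, order)))
           for order in (1, 3, 5)}
for order, err in max_err.items():
    print(f"order {order}: max |error| = {err:.2e}")
```

On $|x| \le 0.3$ each additional odd term reduces the maximum error by more than an order of magnitude.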
In the cubic approximation, eq. (5) can be rewritten as:

$$V \simeq h + g(h) + z, \qquad (9)$$

with:

$$g(h) = g_0\,h^3, \qquad h = J(x+\nu), \qquad (10)$$

where the cube is taken component by component.
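The model of eqs. (1)-(10) can be sampled directly, which is convenient for checking the analytical results numerically. The sketch below is a minimal Monte Carlo sampler under stated assumptions (inputs with covariance C, input noise variance b0, output noise variance b, cubic non-linearity); the function name `sample_channel` and its signature are hypothetical. With $g_0 = 0$ the empirical covariance of V should approach $A + bI$, with A as defined in eq. (17).

```python
import numpy as np

def sample_channel(J, C, b0, b, g0, n_samples, rng):
    """Draw (x, V) pairs from V = h + g0*h**3 + z with h = J(x + nu), eqs. (9)-(10)."""
    n_out, n_in = J.shape
    x = rng.multivariate_normal(np.zeros(n_in), C, size=n_samples)  # inputs, covariance C
    nu = rng.normal(0.0, np.sqrt(b0), size=(n_samples, n_in))       # input noise, variance b0
    z = rng.normal(0.0, np.sqrt(b), size=(n_samples, n_out))        # output noise, variance b
    h = (x + nu) @ J.T                                               # local fields, one row per sample
    V = h + g0 * h**3 + z                                            # cubic non-linearity, component-wise
    return x, V

rng = np.random.default_rng(0)
N, b0, b = 5, 0.1, 0.6
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
C = np.eye(N)
x, V_lin = sample_channel(J, C, b0, b, g0=0.0, n_samples=200_000, rng=rng)
A = J @ C @ J.T + b0 * J @ J.T
print(np.abs(np.cov(V_lin.T) - (A + b * np.eye(N))).max())  # small for the linear channel
```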



3 A perturbative approach to the mutual information 

In analogy with what is reported in [?], we express the mutual information between input and output as the difference between the output entropy and the equivocation between input and output¹:

¹Using the natural logarithm, we implicitly measure the information in natural units (nats). Conversion to bits is obtained by dividing the mutual information by ln 2.
$$I = \int dx \int dV\, P(x,V)\,\ln\frac{P(x,V)}{P(x)\,P(V)} = H(V) - \langle H(V|x)\rangle_x, \qquad (11)$$
$$H(V) = -\int dV\, P(V)\,\ln P(V), \qquad (12)$$
$$\langle H(V|x)\rangle_x = -\int dx \int dV\, P(x)\,P(V|x)\,\ln P(V|x). \qquad (13)$$

The probabilities P(V), P(x), P(V|x) are fixed by the model equations above; in particular, explicit expressions for both P(V) and P(V|x) are obtained by making the dependence of V on h explicit:

$$P(V) = \int dh\, P(h)\,P(V|h), \qquad (14)$$
$$P(V|x) = \int dh\, P(h|x)\,P(V|h), \qquad (15)$$
$$P(V|h) = \frac{1}{\sqrt{(2\pi b)^N}}\, e^{-(V-h-g(h))^2/2b}, \qquad (16)$$

where $h = J(x+\nu)$. Given the Gaussian statistics of x and ν, eqs. (1)-(4), it is straightforward to show that both P(h) and P(h|x) are Gaussian:

$$P(h) = \frac{e^{-\frac12 h^T A^{-1} h}}{\sqrt{(2\pi)^N \det A}}, \qquad A = JCJ^T + b_0\,JJ^T, \qquad (17)$$
$$P(h|x) = \frac{e^{-\frac12 (h-Jx)^T B^{-1} (h-Jx)}}{\sqrt{(2\pi)^N \det B}}, \qquad B = b_0\,JJ^T. \qquad (18)$$



It is precisely the fact that only Gaussian averages appear that will eventually allow us to express the mutual information as a series of Feynman diagrams by means of Wick's theorem. Further details can be found in [?].
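The covariance matrices in eqs. (17)-(18) can be checked by sampling: fixing one input x and resampling only the input noise ν should reproduce the mean Jx and the covariance B, while $A - B = JCJ^T$ is the part contributed by the input signal. The sketch assumes a square J with entries of standard deviation $1/\sqrt N$; `covariance_matrices` is a hypothetical helper.

```python
import numpy as np

def covariance_matrices(J, C, b0):
    """Return A = JCJ^T + b0 JJ^T (covariance of h) and B = b0 JJ^T (covariance of h given x)."""
    JJt = J @ J.T
    return J @ C @ J.T + b0 * JJt, b0 * JJt

rng = np.random.default_rng(0)
N, b0 = 4, 0.1
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
C = np.eye(N)
A, B = covariance_matrices(J, C, b0)

# Fix one input x and resample only the input noise nu: h | x should be N(Jx, B), eq. (18).
x = rng.normal(size=N)
nu = rng.normal(0.0, np.sqrt(b0), size=(200_000, N))
h = (x + nu) @ J.T
print(np.abs(np.cov(h.T) - B).max())          # small
print(np.abs(h.mean(axis=0) - J @ x).max())   # small
```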

3.1 Perturbative expansion for the output entropy 

Let us consider eq. (12). When the non-linear term g(h) is non-zero, the integration over h cannot be performed without resorting to some approximation. In [9] it was shown that an expansion up to first order in g(h) allows one to perform the integration and derive an analytical expression for the mutual information. Our purpose here is to extend that analysis by considering all terms in the expansion, of any order in g(h). Let us consider the following equalities:



$$P(V) = \int dh\, P(h)\,P_0(V|h)\,e^{\Delta/b} = \int dh\, P(h)\,P_0(V|h)\sum_{l=0}^{\infty}\frac{\Delta^l}{l!\,b^l}, \qquad (19)$$

where

$$P_0(V|h) = \frac{1}{\sqrt{(2\pi b)^N}}\, e^{-(V-h)^2/2b}, \qquad (20)$$
$$\Delta(V,h) = (V-h)\cdot g(h) - \frac{g(h)\cdot g(h)}{2}, \qquad (21)$$

and we have performed the perturbative expansion in powers of Δ, keeping in mind that an explicit expression in powers of g can be extracted a posteriori.
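The factorization in eq. (19) rests on the identity $P(V|h) = P_0(V|h)\,e^{\Delta/b}$, which follows from expanding the square in eq. (16). A short numerical check of that identity, and of how quickly the truncated exponential series converges when Δ/b is small, might look as follows; the values of N, b and $g_0$ are illustrative and `log_gauss` is a hypothetical helper.

```python
import numpy as np
from math import factorial

def log_gauss(r, b):
    """log of (2*pi*b)^(-N/2) * exp(-|r|^2 / (2b)) for an N-vector r."""
    return -0.5 * r.size * np.log(2 * np.pi * b) - r @ r / (2 * b)

rng = np.random.default_rng(1)
N, b, g0 = 3, 0.6, 0.01
h = rng.normal(size=N)
V = h + rng.normal(0.0, np.sqrt(b), size=N)

g = g0 * h**3                                  # g(h) = g0 h^3, eq. (10)
delta = (V - h) @ g - 0.5 * g @ g              # Delta(V, h), eq. (21)

exact = np.exp(log_gauss(V - h - g, b))        # P(V|h), eq. (16)
p0 = np.exp(log_gauss(V - h, b))               # P_0(V|h), eq. (20)
# Partial sums of the series in eq. (19), truncated at l = 0, 1, 2, 3
partial = [p0 * sum((delta / b) ** l / factorial(l) for l in range(L + 1))
           for L in range(4)]
errors = [abs(p - exact) for p in partial]
print(errors)                                  # decreasing with the truncation order
```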

Inserting the expansion in powers of Δ, eq. (19), into eq. (12), it can be shown that the output entropy can be expressed as follows:



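$$H(V) = H_0(V) - \int dV \sum_{l=1}^{\infty}\frac{\langle\Delta^l\rangle}{l!\,b^l}\,\ln P_0(V) + \int dV\, P_0(V)\sum_{n=2}^{\infty}\frac{(-1)^{n+1}}{n(n-1)}\left[\sum_{l=1}^{\infty}\frac{\langle\Delta^l\rangle}{l!\,b^l\,P_0(V)}\right]^n, \qquad (22)$$

where we have used the notation:

$$\langle\Delta^l\rangle = \int dh\, P(h)\,P_0(V|h)\,\Delta^l = \int dh\, P(h)\,P_0(V|h)\left[(V-h)\cdot g(h) - \frac{g(h)\cdot g(h)}{2}\right]^l. \qquad (23)$$

$H_0(V)$ is the output entropy when g = 0:

$$H_0(V) = -\int dV\, P_0(V)\,\ln P_0(V). \qquad (24)$$

From eqs. (17), (20) it is easy to derive that:

$$P_0(V) = \int dh\, P(h)\,P_0(V|h) = \frac{e^{-\frac12 V^T (A+bI)^{-1} V}}{\sqrt{(2\pi)^N \det(A+bI)}}, \qquad (25)$$

which yields an explicit expression for $H_0(V)$:

$$H_0(V) = \frac{N}{2}\left[1 + \ln 2\pi\right] + \frac12 \ln\det(A+bI). \qquad (26)$$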
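The coefficients $(-1)^{n+1}/(n(n-1))$ in eq. (22) come from writing $P(V) = P_0(V)(1+\varepsilon)$, with $\varepsilon = \sum_{l\ge1}\langle\Delta^l\rangle/(l!\,b^l P_0(V))$, and expanding $(1+\varepsilon)\ln(1+\varepsilon) = \varepsilon + \sum_{n\ge2}(-1)^n\varepsilon^n/(n(n-1))$; the linear term integrates to zero because both P and $P_0$ are normalized. A quick numerical check of that expansion (valid for $|\varepsilon| < 1$) is sketched below; the function name is ours.

```python
import numpy as np

def one_plus_eps_log_series(eps, n_max):
    """Truncated series eps + sum_{n=2}^{n_max} (-1)^n eps^n / (n (n - 1))."""
    n = np.arange(2, n_max + 1)
    return eps + np.sum((-1.0) ** n * eps**n / (n * (n - 1)))

for eps in (0.05, -0.2, 0.5):
    exact = (1 + eps) * np.log1p(eps)
    approx = one_plus_eps_log_series(eps, n_max=40)
    print(f"eps = {eps:+.2f}: exact = {exact:.12f}, series = {approx:.12f}")
```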



In the limit where g is very small and $g_0 \ll b$, one can stop at first order in Δ, neglecting the second-order term in g. The expression for the output entropy becomes:

$$H(V) \simeq H_0(V) - \frac1b\int dh\int dV\, P(h)\,P_0(V|h)\,g(h)\cdot(V-h)\,\ln P_0(V). \qquad (27)$$



In this simpler case, as shown in [9], the integration is straightforward and leads to an explicit final expression for the output entropy, as well as for the mutual information. Here we show that our approach can be generalized to every order in perturbation theory, establishing the basic rules to identify each integral with a diagram, so that the analytical evaluation is reduced to a combinatorial problem via Wick's theorem.
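Let us reconsider eq. (23). The change of variables $h \to \gamma + \frac1b DV$ reduces the integration over h to a Gaussian one:

$$\langle\Delta^l\rangle = P_0(V)\int d\gamma\, P(\gamma)\left[g_0\left(V-\gamma-\tfrac1b DV\right)\cdot\left(\gamma+\tfrac1b DV\right)^3 - \frac{g_0^2}{2}\left(\gamma+\tfrac1b DV\right)^6\right]^l, \qquad (28)$$

where

$$P(\gamma) = \frac{e^{-\frac12 \gamma^T D^{-1} \gamma}}{\sqrt{(2\pi)^N \det D}}, \qquad D^{-1} = A^{-1} + b^{-1} I. \qquad (29)$$

Here the sixth power is summed over components, as in $g(h)\cdot g(h) = g_0^2\sum_i h_i^6$.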
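The change of variables is a completion of the square: with $D^{-1} = A^{-1} + b^{-1}I$ one has $h^TA^{-1}h + (V-h)^2/b = \gamma^TD^{-1}\gamma + V^T(A+bI)^{-1}V$ for $\gamma = h - \frac1b DV$, which splits $P(h)P_0(V|h)$ into $P(\gamma)P_0(V)$. The sketch below checks this identity, together with the equivalent form $D = bA(A+bI)^{-1}$, for a random positive-definite A (an arbitrary stand-in, not the A of a particular network).

```python
import numpy as np

rng = np.random.default_rng(2)
N, b = 5, 0.6
M = rng.normal(size=(N, N))
A = M @ M.T + 0.5 * np.eye(N)                 # any symmetric positive-definite A
I = np.eye(N)

D = np.linalg.inv(np.linalg.inv(A) + I / b)   # eq. (29)
D_alt = b * A @ np.linalg.inv(A + b * I)      # equivalent closed form

h, V = rng.normal(size=N), rng.normal(size=N)
lhs = h @ np.linalg.solve(A, h) + (V - h) @ (V - h) / b
gam = h - D @ V / b                           # gamma = h - DV/b
rhs = gam @ np.linalg.solve(D, gam) + V @ np.linalg.solve(A + b * I, V)
print(lhs, rhs)                               # equal up to rounding
```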

Inserting this expression into eq. (22) one obtains:



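$$H(V) = H_0(V) + \frac12\left\langle \sum_{l=1}^{\infty}\frac{1}{l!\,b^l}\left\langle\left[g_0\left(V-\gamma-\tfrac1b DV\right)\cdot\left(\gamma+\tfrac1b DV\right)^3 - \frac{g_0^2}{2}\left(\gamma+\tfrac1b DV\right)^6\right]^l\right\rangle_\gamma V^T[A+bI]^{-1}V\right\rangle_V$$
$$+ \left\langle \sum_{n=2}^{\infty}\frac{(-1)^{n+1}}{n(n-1)}\left[\sum_{l=1}^{\infty}\frac{1}{l!\,b^l}\left\langle\left[g_0\left(V-\gamma-\tfrac1b DV\right)\cdot\left(\gamma+\tfrac1b DV\right)^3 - \frac{g_0^2}{2}\left(\gamma+\tfrac1b DV\right)^6\right]^l\right\rangle_\gamma\right]^n\right\rangle_V, \qquad (30)$$

where

$$\langle F(\gamma,V)\rangle_\gamma = \int d\gamma\, P(\gamma)\,F(\gamma,V), \qquad \langle F(\gamma,V)\rangle_V = \int dV\, P_0(V)\,F(\gamma,V). \qquad (31)$$

P(γ) and $P_0(V)$ are given by eqs. (29) and (25) respectively, and we have explicitly put $g(h) = g_0 h^3$. Only the quadratic part of $\ln P_0(V)$ contributes to the first sum: its constant part multiplies $\sum_{l\ge1}\int dV\,\langle\Delta^l\rangle/(l!\,b^l) = \int dV\,[P(V) - P_0(V)] = 0$.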



3.2 Perturbative expansion for the equivocation

A very similar expression can be obtained for the equivocation by means of the same perturbative approach.

Let us reconsider eqs. (13), (16). Since P(V|x) can be expressed as an integral over the distribution P(V|h), we can use the same perturbative expansion already introduced for P(V) in eq. (19):



$$P(V|x) = \int dh\, P(h|x)\,P_0(V|h)\,e^{\Delta/b} = \int dh\, P(h|x)\,P_0(V|h)\sum_{l=0}^{\infty}\frac{\Delta^l}{l!\,b^l}, \qquad (32)$$



where $P_0(V|h)$ and Δ are given in eqs. (20), (21) and P(h|x) is given in eq. (18). Introducing eq. (32) into the expression for the equivocation one obtains:



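$$\langle H(V|x)\rangle_x = \langle H_0(V|x)\rangle_x - \int dx\, P(x)\int dV \sum_{l=1}^{\infty}\frac{\langle\Delta^l\rangle_{h|x}}{l!\,b^l}\,\ln P_0(V|x) + \int dx\, P(x)\int dV\, P_0(V|x)\sum_{n=2}^{\infty}\frac{(-1)^{n+1}}{n(n-1)}\left[\sum_{l=1}^{\infty}\frac{\langle\Delta^l\rangle_{h|x}}{l!\,b^l\,P_0(V|x)}\right]^n, \qquad (33)$$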



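where we have used the notation:

$$\langle\Delta^l\rangle_{h|x} = \int dh\, P(h|x)\,P_0(V|h)\,\Delta^l = \int dh\, P(h|x)\,P_0(V|h)\left[(V-h)\cdot g(h) - \frac{g(h)\cdot g(h)}{2}\right]^l, \qquad (34)$$

and $\langle H_0(V|x)\rangle_x$ is the equivocation when g = 0:

$$\langle H_0(V|x)\rangle_x = -\int dx\int dV\, P(x)\,P_0(V|x)\,\ln P_0(V|x), \qquad (35)$$
$$P_0(V|x) = \frac{e^{-\frac12 (V-Jx)^T (B+bI)^{-1} (V-Jx)}}{\sqrt{(2\pi)^N \det(B+bI)}}. \qquad (36)$$

It is then easy to derive the final expression for the zeroth-order equivocation:

$$\langle H_0(V|x)\rangle_x = \frac{N}{2}\left(1 + \ln 2\pi\right) + \frac12\ln\det(B+bI). \qquad (37)$$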

On the other hand, if one keeps the contributions to the information up to first order in $g_0$, one obtains:

$$\langle H(V|x)\rangle_x \simeq \langle H_0(V|x)\rangle_x - \frac1b\int dh\int dx\int dV\, P(x)\,P(h|x)\,P_0(V|h)\,g(h)\cdot(V-h)\,\ln P_0(V|x). \qquad (38)$$

Eq. (33) can be treated in very much the same way as the output entropy, eq. (22). Since three different integrations are present in eq. (33), diagonalizing the Gaussian distributions requires replacing both h and V in sequence:

$$h \to \gamma + D\left(B^{-1}Jx + \frac1b V\right), \qquad D^{-1} = B^{-1} + b^{-1} I,$$
$$V \to u + Jx, \qquad (39)$$

where D now denotes the matrix defined through B. After both substitutions, $h = \gamma + \frac1b Du + Jx$ and $V - h = u - \gamma - \frac1b Du$. By means of these substitutions eq. (33) can be rewritten as follows:



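$$\langle H(V|x)\rangle_x = \langle H_0(V|x)\rangle_x + \frac12\left\langle \sum_{l=1}^{\infty}\frac{1}{l!\,b^l}\left\langle\left[g_0\left(u-\gamma-\tfrac1b Du\right)\cdot\left(\gamma+\tfrac1b Du+Jx\right)^3 - \frac{g_0^2}{2}\left(\gamma+\tfrac1b Du+Jx\right)^6\right]^l\right\rangle_\gamma u^T[B+bI]^{-1}u\right\rangle_{u,x}$$
$$+ \left\langle \sum_{n=2}^{\infty}\frac{(-1)^{n+1}}{n(n-1)}\left[\sum_{l=1}^{\infty}\frac{1}{l!\,b^l}\left\langle\left[g_0\left(u-\gamma-\tfrac1b Du\right)\cdot\left(\gamma+\tfrac1b Du+Jx\right)^3 - \frac{g_0^2}{2}\left(\gamma+\tfrac1b Du+Jx\right)^6\right]^l\right\rangle_\gamma\right]^n\right\rangle_{u,x}, \qquad (40)$$

where

$$\langle F(\gamma,u,x)\rangle_\gamma = \int d\gamma\, P(\gamma)\,F(\gamma,u,x), \qquad \langle F(\gamma,u,x)\rangle_{u,x} = \int du\int dx\, P(u)\,P(x)\,F(\gamma,u,x), \qquad (41)$$
$$P(\gamma) = \frac{e^{-\frac12 \gamma^T D^{-1} \gamma}}{\sqrt{(2\pi)^N \det D}}, \qquad (42)$$
$$P(u) = \frac{e^{-\frac12 u^T (B+bI)^{-1} u}}{\sqrt{(2\pi)^N \det(B+bI)}}. \qquad (43)$$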

As is evident from eqs. (30), (40), the mutual information is expressed as a series of Gaussian averages, in which all powers higher than the second can be reduced via Wick's theorem. This allows us to establish the basic rules for a diagrammatic expansion in terms of Feynman graphs, which is the subject of the next section.



4 A diagrammatic formalism for the expansion 

It is well known that higher-order moments of a Gaussian distribution can be reduced to sums of products of second-order moments via Wick's theorem, in both classical and quantum systems [12]. The introduction of a diagrammatic formalism associates a graph with each type of integral. The whole series can therefore be expressed in a compact and elegant way, and the integrations can be performed symbolically by contracting all the lines pair by pair, so as to obtain all the topologically distinct diagrams. Since each contraction is associated with a precise numerical value, the value of each diagram is obtained by simply multiplying all the factors corresponding to the contractions of its lines.
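Wick's theorem states that a moment of a zero-mean Gaussian vector is the sum, over all ways of pairing its factors, of the products of the pairwise covariances. The brute-force sketch below enumerates the pairings explicitly; it is only meant to make the combinatorics concrete (for example, there are 15 full contractions of the six factors produced by the sixth powers in eqs. (30), (40)), and the function names are ours.

```python
import numpy as np

def pairings(items):
    """Yield every perfect matching of the list `items` as a list of pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        for tail in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, rest[k])] + tail

def wick_moment(cov, indices):
    """<x_{i1} x_{i2} ... x_{in}> for zero-mean Gaussian x with covariance `cov`."""
    if len(indices) % 2:
        return 0.0                     # odd moments vanish
    total = 0.0
    for matching in pairings(list(range(len(indices)))):
        # one term per full contraction: product of the paired covariances
        total += np.prod([cov[indices[a], indices[c]] for a, c in matching])
    return total

cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
print(len(list(pairings(list(range(6))))))   # 15 contractions of six lines
print(wick_moment(cov, [0, 0, 0, 0]))        # 3 * 2.0**2 = 12.0
print(wick_moment(cov, [0, 0, 0, 1]))        # 3 * 2.0 * 0.5 = 3.0
```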

In the previous work [9], where we focused on the first-order terms in g, we identified the basic elements needed to build a diagrammatic expansion for the mutual information up to first order, i.e. for the first-order part of eqs. (30), (40). Here we generalize our approach and show that a diagrammatic interpretation can be provided at any order of the expansion, establishing the basic elements and rules separately for the output entropy and for the equivocation.



4.1 Diagrammatic rules for the evaluation of the output entropy 

Since two distinct averages characterize eq. (30), namely over γ and over V, one is naturally tempted to introduce two distinct symbolic lines for γ and V. Yet two distinct objects have to be contracted and integrated over V, namely V itself and DV. It is therefore more convenient to introduce two distinct lines for V and DV. Since either of these two lines can be contracted with itself or with the other one, there are three distinct contraction rules, corresponding to integrating different objects over V. The whole prescription can be given as follows:



1. Each term $V_i$ is represented by a straight line.

2. Each term $(DV)_i$ is represented by a crossed straight line.

3. Each term $\gamma_i$ is represented by a wiggly line.

4. Each matrix element $[A+bI]^{-1}_{ij}$ is represented by a dashed rhomboid.

5. The integration of the product $V_iV_j$ over V corresponds to the contraction of two straight lines coming out of vertices i and j. This produces the matrix element $(A+bI)_{ij}$.

6. The integration of the product $V_i(DV)_j/b$ over V corresponds to the contraction of a straight line coming out of vertex i with a crossed straight line coming out of vertex j. This produces the matrix element $[(A+bI)D]_{ij}/b$.

7. The integration of the product $(DV)_i(DV)_j/b^2$ over V corresponds to the contraction of two crossed straight lines coming out of vertices i and j. This produces the matrix element $[D(A+bI)D^T]_{ij}/b^2$.

8. The integration of the product $\gamma_i\gamma_j$ over γ corresponds to the contraction of two wiggly lines coming out of vertices i and j. This produces the matrix element $D_{ij}$.
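The values attached to the contractions in rules 5-8 are second moments under $P_0(V)$ and $P(\gamma)$, so they can be checked by sampling. The sketch below, with an arbitrary positive-definite stand-in for A, estimates the three V-contractions by Monte Carlo; it also shows that rule 6 simplifies to $[(A+bI)D]_{ij}/b = A_{ij}$, since $D = bA(A+bI)^{-1}$. Rule 8 is simply the covariance D of γ.

```python
import numpy as np

rng = np.random.default_rng(3)
N, b = 4, 0.6
M = rng.normal(size=(N, N)) / np.sqrt(N)
A = M @ M.T + 0.1 * np.eye(N)                  # arbitrary SPD stand-in for JCJ^T + b0 JJ^T
I = np.eye(N)
D = np.linalg.inv(np.linalg.inv(A) + I / b)    # eq. (29)

n = 400_000
V = rng.multivariate_normal(np.zeros(N), A + b * I, size=n)   # V ~ P_0(V), eq. (25)
DV = V @ D.T                                   # each row is (DV)^T for one sample
rule5 = V.T @ V / n                            # <V_i V_j>             -> (A + bI)_ij
rule6 = V.T @ DV / (b * n)                     # <V_i (DV)_j> / b      -> [(A + bI) D]_ij / b
rule7 = DV.T @ DV / (b**2 * n)                 # <(DV)_i (DV)_j> / b^2 -> [D (A + bI) D^T]_ij / b^2
```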



Eq. (30) can be written in terms of these symbols:

[Eq. (44): eq. (30) rewritten with the line symbols defined above; the diagrams are not reproduced here.]



where we have symbolically separated the average over V from the average over DV, as a reminder that contractions have to be performed on both objects according to the rules given above.



4.2 Diagrammatic rules for the evaluation of the equivocation 



A diagrammatic interpretation can be given for eq. (40) by introducing appropriate symbols and contraction rules:



1. Each term $u_i$ is represented by a double straight line.

2. Each term $(Du)_i/b$ is represented by a crossed double straight line.

3. Each term $\gamma_i$ is represented by a double wiggly line.

4. Each term $(Jx)_i$ is represented by a dotted line.

5. Each matrix element $[B+bI]^{-1}_{ij}$ is represented by a filled rhomboid.

6. The integration of the product $u_iu_j$ over u corresponds to the contraction of two double straight lines coming out of vertices i and j. This produces the matrix element $(B+bI)_{ij}$.

7. The integration of the product $u_i(Du)_j/b$ over u corresponds to the contraction of a double straight line coming out of vertex i with a crossed double straight line coming out of vertex j. This produces the matrix element $B_{ij}$.

8. The integration of the product $(Du)_i(Du)_j/b^2$ over u corresponds to the contraction of two crossed double straight lines coming out of vertices i and j. This produces the matrix element $[B(B+bI)^{-1}B^T]_{ij}$.

9. The integration of the product $\gamma_i\gamma_j$ over γ corresponds to the contraction of two double wiggly lines coming out of vertices i and j. This produces the matrix element $D_{ij}$.

10. The integration of the product $(Jx)_i(Jx)_j$ over x corresponds to the contraction of two dotted lines coming out of vertices i and j. This produces the matrix element $[JCJ^T]_{ij}$.
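For the equivocation, the matrix D is built from B instead of A ($D^{-1} = B^{-1} + b^{-1}I$, eq. (39)), and rules 7 and 8 rely on the identities $(B+bI)D/b = B$ and $D(B+bI)D^T/b^2 = B(B+bI)^{-1}B$. The sketch below checks them, together with the substitution $h = \gamma + \frac1b Du + Jx$ implied by eq. (39), for one random square J (assumed full rank, so that B is invertible).

```python
import numpy as np

rng = np.random.default_rng(4)
N, b0, b = 5, 0.1, 0.6
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
B = b0 * J @ J.T                               # eq. (18); invertible for full-rank J
I = np.eye(N)
D = np.linalg.inv(np.linalg.inv(B) + I / b)    # D^{-1} = B^{-1} + I/b, eq. (39)

rule7 = (B + b * I) @ D / b                    # <u_i (Du)_j> / b
rule8 = D @ (B + b * I) @ D.T / b**2           # <(Du)_i (Du)_j> / b^2

# Substitution check: gamma + D(B^{-1}Jx + V/b) with V = u + Jx equals gamma + Du/b + Jx
x, u, gam = rng.normal(size=N), rng.normal(size=N), rng.normal(size=N)
h1 = gam + D @ (np.linalg.solve(B, J @ x) + (u + J @ x) / b)
h2 = gam + D @ u / b + J @ x
```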

Eq. (40) can be written in this formalism:

[Eq. (45): eq. (40) rewritten with the line symbols defined above; the diagrams are not reproduced here.]



Eqs. (44), (45) constitute the final expression for the mutual information at every order. Application of Wick's theorem and of the contraction rules allows one to derive analytically the contribution to the mutual information at each order in g.



The first-order approximation in g studied in [9] is easily recovered from eqs. (44), (45) by putting l = 1 and neglecting the double summation over the indices l and n, which gives corrections only at orders higher than the first in g.
5 A detailed analysis of the different contributions to the mutual information



The expansion we have derived allows us to investigate how the different orders contribute to the mutual information as the expansion parameter g varies.

The mutual information at zeroth order follows directly from eqs. (26), (37):

$$I_0 = H_0(V) - \langle H_0(V|x)\rangle_x = \frac12\ln\det\left[(A+bI)(B+bI)^{-1}\right]. \qquad (46)$$
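Eq. (46) is straightforward to evaluate numerically. The sketch below computes $I_0$ (in nats) with `numpy.linalg.slogdet` for the setting quoted in the figure captions (C = I, $b_0 = 0.1$, b = 0.6, square Gaussian J with standard deviation $1/\sqrt N$); the function name is ours. For C = I, $I_0$ reduces to a sum over the eigenvalues λ of $JJ^T$ of $\frac12\ln[((1+b_0)\lambda + b)/(b_0\lambda + b)]$, which makes the roughly linear growth with N explicit.

```python
import numpy as np

def zeroth_order_information(J, C, b0, b):
    """I_0 = (1/2) ln det[(A + bI)(B + bI)^{-1}], eq. (46), in nats."""
    eye = np.eye(J.shape[0])
    A = J @ C @ J.T + b0 * J @ J.T      # eq. (17)
    B = b0 * J @ J.T                    # eq. (18)
    _, logdet_a = np.linalg.slogdet(A + b * eye)
    _, logdet_b = np.linalg.slogdet(B + b * eye)
    return 0.5 * (logdet_a - logdet_b)

rng = np.random.default_rng(5)
b0, b = 0.1, 0.6
results = {}
for N in (50, 100, 200):
    J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
    I0 = zeroth_order_information(J, np.eye(N), b0, b)
    lam = np.linalg.eigvalsh(J @ J.T)   # eigenvalue form, valid for C = I
    I0_eig = 0.5 * np.sum(np.log(((1 + b0) * lam + b) / (b0 * lam + b)))
    results[N] = (I0, I0_eig)
    print(N, I0, I0 / N)                # I0 / N stays roughly constant
```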



The first-order approximation was the primary object of investigation in [9], where a diagrammatic interpretation was provided as well. From eqs. (27), (38) it can be shown that the first-order contribution to the mutual information can be written as follows:



$$I_1 = -3g_0\sum_i A_{ii}\left\{\left[(A+bI)^{-1}A\right]_{ii} - \left[(B+bI)^{-1}B\right]_{ii}\right\}. \qquad (47)$$



Further details about the derivation of the first-order approximation can be found in [9].
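The first-order term is built from a difference of two scalar products of N-dimensional vectors: the diagonal of A with the diagonals of $(A+bI)^{-1}A$ and of $(B+bI)^{-1}B$. A minimal sketch for evaluating that sum, under the same random-connectivity assumptions used for the figures (function name ours), is given below; multiplying it by the prefactor of eq. (47) gives $I_1$. Because $A - B = JCJ^T \succeq 0$ and $x \mapsto x/(x+b)$ is operator monotone, each diagonal difference is non-negative, so the sum itself is non-negative.

```python
import numpy as np

def first_order_bracket(A, B, b):
    """sum_i A_ii ([(A+bI)^{-1} A]_ii - [(B+bI)^{-1} B]_ii), the sum appearing in eq. (47)."""
    eye = np.eye(A.shape[0])
    diag_a = np.diag(np.linalg.solve(A + b * eye, A))
    diag_b = np.diag(np.linalg.solve(B + b * eye, B))
    return np.diag(A) @ diag_a - np.diag(A) @ diag_b   # two N-dimensional scalar products

rng = np.random.default_rng(7)
b0, b = 0.1, 0.6
sums = {}
for N in (50, 200):
    J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
    A = J @ J.T + b0 * J @ J.T      # eq. (17) with C = I
    B = b0 * J @ J.T                # eq. (18)
    sums[N] = first_order_bracket(A, B, b)
    print(N, sums[N], sums[N] / N)  # roughly proportional to N
```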

We now focus on the second-order contributions. Let us reconsider eqs. (44) and (45). The expansion in l, n is not a direct expansion in $g_0$: the first order in l contains both first- and second-order terms in $g_0$. One must therefore be careful to extract all the second-order terms in $g_0$ from the appropriate powers in l, n. In particular, in the expansion of both the output entropy and the equivocation, one has to put l = 1, 2 in the first sum and n = 2, l = 1 in the second sum, and then retain only the second-order terms. It can be shown that:



[Eqs. (48), (49): the second-order terms $H_2(V)$ and $\langle H_2(V|x)\rangle_x$ written as sums of diagrams; the diagrams are not reproduced here.]

where we have extracted the terms of order $g_0^2$.

The computation of all the second-order diagrams is very long, since it involves sixth powers and four different types of contractions, in both the equivocation and the output entropy. The calculation simplifies considerably, however, in some limits.

In particular, let us consider the case where $g_0 \ll b$ but b is not too small. Both the equivocation and the output entropy contain terms ranging from order $g_0^2/b^2$ up to order $g_0^2 b^2$. Indeed, looking at the contraction rules given above, one notices that the contraction of two wiggly lines produces a matrix element $D_{ij}$, where the matrix D is of order b, while the other contractions produce elements of order 1 with respect to b. Therefore the dominant contributions, when b is not too small, are given by the diagrams in which 3 or 4 wiggly loops appear. Keeping only these diagrams and performing all the contractions, it can be shown that the dominant second-order contribution to the output entropy is given by the following sum of diagrams:



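[Eq. (50): $H_2(V)$, of order $g_0^2 b^2$, written as a sum of diagrams; the diagrams are not reproduced here.]

which corresponds to the following analytical expression:

[Eq. (51): closed-form expression involving $\mathrm{Tr}\,[A(A+bI)^{-1}]$; the exact coefficients are not legible in the source.]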



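Under the same assumption, the second-order contribution to the equivocation can also be expressed as a simple sum of diagrams:

[Eq. (52): $\langle H_2(V|x)\rangle_x$ written as a sum of diagrams; the diagrams are not reproduced here.]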



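After some elementary matrix manipulations it is easy to show that the analytical expression for the equivocation reads:

[Eq. (53): $\langle H_2(V|x)\rangle_x \simeq -g_0^2 b^2\,[\dots]$, where the bracket contains traces built from $B(B+bI)^{-1}$; the exact coefficients are not legible in the source.]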



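Altogether, the second-order contribution to the mutual information in the limit where b is not too small can be written as:

[Eq. (54): $I_2 \simeq 3g_0^2 b^2\,[\dots]$, where the bracket contains two terms given by traces built from $B(B+bI)^{-1}$ and a third, negative term given by the square of a trace built from $A(A+bI)^{-1}$; the exact coefficients are not legible in the source.]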

Figure 1: Mutual information as a function of the parameter $g_0$ (logarithmic axis), comparing the zeroth-, first- and second-order approximations. N = 100; $b_0$ = 0.1; b = 0.6. The correlation matrix C is equal to the identity and the connectivities $J_{ij}$ linking each output unit i to the input neurons j are randomly chosen from a Gaussian distribution with mean zero and standard deviation $1/\sqrt{N}$.

Since the mutual information has been evaluated for a generic connectivity matrix $\{J_{ij}\}$, both the total information and its order-specific contributions will depend on the structure of the connectivities. Without aiming at generality, we can nonetheless quantify the different contributions to the mutual information in the specific case where the connectivities linking each output neuron to the input ones are drawn from a Gaussian distribution with mean zero and standard deviation $1/\sqrt{N}$. As is known from the theory of spin glasses and neural networks, this normalization ensures that the local field $h_i$ acting on each unit i remains finite when N is large.
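The effect of the $1/\sqrt N$ normalization can be seen directly: with C = I, eq. (17) gives $\mathrm{Var}(h_i) = A_{ii} = (1+b_0)\sum_j J_{ij}^2$, whose average stays close to $1+b_0$ independently of N, whereas unit-variance couplings would make it grow like N. A small sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(6)
b0 = 0.1
field_var = {}
for N in (10, 100, 1000):
    J_scaled = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))  # std 1/sqrt(N)
    J_unit = rng.normal(0.0, 1.0, size=(N, N))                 # std 1, for comparison
    var_scaled = (1 + b0) * np.sum(J_scaled**2, axis=1)        # A_ii with C = I
    var_unit = (1 + b0) * np.sum(J_unit**2, axis=1)
    field_var[N] = (var_scaled.mean(), var_unit.mean())
    print(N, field_var[N])   # first entry ~ 1 + b0, second grows like (1 + b0) N
```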

Fig. 1 shows the mutual information in the zeroth-, first- and second-order approximations for increasing values of $g_0$. Two main observations follow from the analysis of the curves:

• both the first- and the second-order contributions in the non-linearity lower the information with respect to the linear network;

• our approximations start to lose their validity for values of $g_0$ larger than 0.01. This does not necessarily mean that higher-order contributions should be added, since we have neglected second-order contributions with powers of b lower than $b^2$: when $g_0$ is no longer much smaller than b, one should probably include these other second-order contributions.

Fig. 2 shows a detail of the previous plot, where the linear and quadratic behaviour in $g_0$ is more evident.

Fig. 3 shows the mutual information as a function of the population size N. As is clear from eqs. (46), (47), (54), the zeroth-order approximation is linear in N, and so is the first-order contribution, since it is the difference between two scalar products of N-dimensional vectors. The second-order contribution, on the other hand, is roughly quadratic in N: a numerical check of its three terms shows that the first two, which are linear in N (traces of N-dimensional matrices), are three orders of magnitude smaller than the third, which is quadratic in N (the square of the trace of an N-dimensional matrix).

Figure 2: Mutual information as a function of the parameter $g_0$ (linear axis, from 0.01 to 0.05), as in Fig. 1. N = 100; $b_0$ = 0.1; b = 0.6. The correlation matrix C is equal to the identity and the connectivities $J_{ij}$ linking each output unit i to the input neurons j are randomly chosen from a Gaussian distribution with mean zero and standard deviation $1/\sqrt{N}$.

6 Conclusions 

We have presented a systematic approach to quantify the effect of small non-linearities in the transfer function on the information transferred by a two-layer network of analogue units. We have derived a perturbative expansion in the non-linearity parameter g, providing an elegant interpretation in terms of Feynman diagrams.

In a previous report [9] we had already quantified the contribution to the information at first order in g. Here we have extended those results, providing an analytical expression for the contributions at each order in perturbation theory. Moreover, our method can be easily applied to any structure of the connectivities, with no restriction to a specific architecture.

As an example, we have quantified the zeroth-, first- and second-order contributions to the information in the case of random, Gaussian-distributed couplings and in the limit where the output noise b is not small. We have found that the main effect of the non-linearity for this particular architecture is a loss of information, detected both at first and at second order in $g_0$. This result is in agreement with previous investigations, where it was shown that two main causes of information loss in a two-layer network with random Gaussian couplings are a non-linearity in the transfer function, such as a threshold, and a large output noise.

The detailed analysis presented here applies to the particular case of cubic non-linearities. Yet, as already remarked in [9], our method can easily be adapted to any non-linearity of generic power 2k + 1: it is enough to replace the third and sixth powers appearing in eqs. (30), (40) with the powers 2k + 1 and 4k + 2, respectively.
Figure 3: Mutual information as a function of the population size N, comparing the zeroth-, first- and second-order approximations in $g_0$. $g_0$ = 0.05; $b_0$ = 0.1; b = 0.6. The correlation matrix C is equal to the identity and the connectivities $J_{ij}$ linking each output unit i to the input neurons j are randomly chosen from a Gaussian distribution with mean zero and standard deviation $1/\sqrt{N}$.



Finally, this allows us to deal with highly non-linear functions, such as a hyperbolic tangent,